Add temporal Gemini video analysis and CloudEvents publishing pipeline (#12)
Conversation
Co-authored-by: groupthinking <154503486+groupthinking@users.noreply.github.com>
```python
    **kwargs
)

logger.info(f"Publishing CloudEvent: type={type}, id={event.id}")
```
Check failure: Code scanning / CodeQL: Log Injection (High)

Copilot Autofix (AI, about 1 month ago)
General fix strategy: sanitize or normalize any user-controlled string before including it in log messages, at least by removing or replacing carriage return and newline characters, and ideally by clearly marking or restricting user input. Here, the minimal, non-breaking fix is to sanitize `type` / `event_type` prior to logging without changing the parameters passed to the rest of the system.

Best concrete fix in this codebase:

- In `CloudEventsPublisher.publish`, derive a sanitized version of the `type` argument (e.g., by replacing `\r` and `\n` with empty strings) and use that in the log message instead of the raw `type`. This keeps the actual CloudEvent unchanged (it still uses the original `type` argument) and only affects the log output.
- Implement the sanitization inline in the logging call (or just above it), avoiding changes to method signatures or additional imports. The standard `str.replace` method is sufficient and does not require new dependencies.

Specific change:

- File: `src/integration/cloudevents_publisher.py`
- In the `publish` method of `CloudEventsPublisher`, change line 171 from `logger.info(f"Publishing CloudEvent: type={type}, id={event.id}")` to use a sanitized local variable, for example: `safe_type = str(type).replace("\r", "").replace("\n", "")` followed by `logger.info(f"Publishing CloudEvent: type={safe_type}, id={event.id}")`.
- No changes are required in `advanced_video_routes.py` for this particular log injection issue, since the problematic sink is in the publisher.

This keeps functionality intact (the event published is identical), and only the logged representation of `type` is sanitized to prevent log injection.
```diff
@@ -168,7 +168,8 @@
         **kwargs
     )

-    logger.info(f"Publishing CloudEvent: type={type}, id={event.id}")
+    safe_type = str(type).replace("\r", "").replace("\n", "")
+    logger.info(f"Publishing CloudEvent: type={safe_type}, id={event.id}")

     try:
         if self.backend == "pubsub":
```
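If the same sanitization is needed at several logging sites, it can be factored into a small helper. A minimal sketch (the helper name is hypothetical, not part of the PR):

```python
def sanitize_for_log(value: object) -> str:
    """Strip CR/LF so attacker-controlled values cannot forge extra log lines."""
    return str(value).replace("\r", "").replace("\n", "")

# A value containing a newline can no longer start a fake log entry
safe = sanitize_for_log("video.analyzed\r\nERROR forged entry")
```

With such a helper, each call site stays a single expression, e.g. `logger.info(f"Publishing CloudEvent: type={sanitize_for_log(type)}")`.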
```python
elif self.backend == "file":
    return await self._publish_file(event)
else:
    logger.error(f"Unsupported backend: {self.backend}")
```
Check failure: Code scanning / CodeQL: Log Injection (High)

Copilot Autofix (AI, about 1 month ago)
In general, to fix log injection when logging user-controlled data, either (a) validate the value and restrict it to a safe, known set of options before logging, or (b) sanitize the value to remove or neutralize characters that can break log structure (such as `\r` and `\n`) or other control sequences. It is preferable to validate against a whitelist of allowed values when the domain is small and known, as is the case for `backend`.

For this specific case, the simplest way to fix the problem without changing existing functionality is to avoid logging the raw `self.backend` value when it is not one of the known backends, and instead log a sanitized or redacted representation. Since the set of valid backends is already encoded in the `Literal` type and in the subsequent if/elif chain, we can compute a "safe" version of the backend string that strips newline characters before logging. This keeps the informational content ("what backend was requested") without allowing injection of extra log lines. Concretely, inside the `publish` method of `CloudEventsPublisher`, we will introduce a local variable `safe_backend` in the unsupported-backend branch, computed from `self.backend` using `.replace('\r', '').replace('\n', '')`, and use that in the log message instead of `self.backend`.

We only need to modify `src/integration/cloudevents_publisher.py`. No changes are required in `advanced_video_routes.py` because the fix is at the sink (the logging call). The change is localized to the else branch at lines 182–184 (around the `logger.error(f"Unsupported backend: {self.backend}")` call). No new imports or helper methods are required.
```diff
@@ -180,7 +180,8 @@
     elif self.backend == "file":
         return await self._publish_file(event)
     else:
-        logger.error(f"Unsupported backend: {self.backend}")
+        safe_backend = str(self.backend).replace("\r", "").replace("\n", "")
+        logger.error(f"Unsupported backend: {safe_backend}")
         return None
 except Exception as e:
     logger.error(f"Failed to publish CloudEvent: {e}", exc_info=True)
```
```python
response.raise_for_status()

logger.info(
    f"Published CloudEvent {event.id} to OpenWhisk trigger: {trigger_name}"
```
Check failure: Code scanning / CodeQL: Log Injection (High)

Copilot Autofix (AI, about 1 month ago)
In general, to fix log injection, any user-provided value written to logs should be normalized so that it cannot inject new log entries or otherwise alter the log format. For plain-text logs, this typically means stripping carriage returns and newlines, and optionally other non-printable control characters, from user input before including it in log messages. The underlying functionality (publishing events) should remain unchanged; only the logging message needs to use a sanitized representation of the user input.

For this specific issue, we should ensure that `trigger_name` used in the log message in `_publish_openwhisk` is sanitized. The best minimal-change approach is:

- Keep using the original `trigger_name` value for constructing the OpenWhisk URL and making the HTTP request (so runtime behavior is unchanged).
- Introduce a sanitized version of the string that removes `\r` and `\n` (and optionally other control characters) right before logging.
- Log the sanitized value instead of the raw `trigger_name`.

Concretely, in `src/integration/cloudevents_publisher.py`:

- Inside `_publish_openwhisk`, after `trigger_name` is computed and the HTTP call succeeds, create a `safe_trigger_name` variable derived from `trigger_name` with line breaks removed (e.g., `trigger_name.replace("\r", "").replace("\n", "")`).
- Change the `logger.info` call on lines 284–286 to interpolate `safe_trigger_name` instead of `trigger_name`.

No additional imports or global helpers are strictly necessary for this small scope; we can do the sanitization inline.
```diff
@@ -281,8 +281,10 @@
     )
     response.raise_for_status()

+    # Sanitize trigger name before logging to prevent log injection
+    safe_trigger_name = trigger_name.replace("\r", "").replace("\n", "")
     logger.info(
-        f"Published CloudEvent {event.id} to OpenWhisk trigger: {trigger_name}"
+        f"Published CloudEvent {event.id} to OpenWhisk trigger: {safe_trigger_name}"
     )
     return event.id
 except Exception as e:
```
| """ | ||
|
|
||
| import asyncio | ||
| import json |
Check notice: Code scanning / CodeQL: Unused import (Note)

Copilot Autofix (AI, about 1 month ago)
To fix an unused import, remove the import statement for the module that is never referenced in the file. This simplifies dependencies, speeds up startup slightly, and removes the static analysis warning.

In this case, the best minimal fix is to delete the line `import json` from `examples/complete_workflow_example.py`. No other code changes are needed because nothing in the file references `json`. Specifically, remove line 15 while keeping the surrounding imports (`asyncio`, `os`, `sys`, `Path`) unchanged.
```diff
@@ -12,7 +12,6 @@
 """

 import asyncio
-import json
 import os
 import sys
 from pathlib import Path
```
```python
import asyncio
import json
import os
```
Check notice: Code scanning / CodeQL: Unused import (Note)

Copilot Autofix (AI, about 1 month ago)
In general, unused import issues are fixed by either removing the import or using the imported symbol if it was unintentionally omitted from the code. Here, the best fix that does not change existing behavior is to delete the unused `import os` line.

Specifically, in `examples/complete_workflow_example.py`, remove the line `import os` at line 16 and leave the remaining imports unchanged. No new methods, imports, or definitions are needed; this is a simple cleanup and does not affect the runtime behavior of the script.
```diff
@@ -13,7 +13,6 @@

 import asyncio
 import json
-import os
 import sys
 from pathlib import Path
```
```python
try:
    if isinstance(result.summary, str) and result.summary.strip().startswith("{"):
        return json.loads(result.summary)
except json.JSONDecodeError:
```
Check notice: Code scanning / CodeQL: Empty except (Note)

Copilot Autofix (AI, about 1 month ago)
In general, empty except blocks should either handle the error meaningfully (e.g., logging, metrics, recovery) or narrow their scope. Here, we can keep the existing functional behavior, falling back to a simple dictionary with the raw summary, while adding logging to record that JSON parsing failed.

The best minimal fix is to replace the `pass` in the `except json.JSONDecodeError:` block around line 347 with a logging statement, similar in style to the one already used in `extract_tutorial_steps`. We can log a warning that includes a short message and optionally a truncated version of `result.summary` for context. No new imports are needed because `logger` is already defined at the top of the file and `json` is already imported.

Concretely:

- In `src/integration/temporal_video_analysis.py`, in the method that returns the comparison analysis (around lines 337–350), update the `except json.JSONDecodeError:` block so that it logs a warning instead of silently passing.
- Keep the existing fallback `return {"comparison": result.summary}` unchanged so that the behavior seen by callers does not change.
```diff
@@ -345,7 +345,7 @@
         if isinstance(result.summary, str) and result.summary.strip().startswith("{"):
             return json.loads(result.summary)
     except json.JSONDecodeError:
-        pass
+        logger.warning("Could not parse comparison analysis JSON")

     return {"comparison": result.summary}
```
```python
import json
import os
from datetime import datetime, timezone
```
Check notice: Code scanning / CodeQL: Unused import (Note, test)

Copilot Autofix (AI, about 1 month ago)
To fix an unused import, remove the unused symbol from the import statement (or the entire import if nothing from it is used). Here, `datetime` is used and `timezone` is not, so we should keep importing `datetime` while dropping `timezone`.

Concretely, in `tests/unit/test_cloudevents_publisher.py`, locate the line `from datetime import datetime, timezone` (line 9) and modify it to import only `datetime`: `from datetime import datetime`. No other code changes are required, since there are no references to `timezone` in the provided code. This removes the unnecessary dependency and resolves the CodeQL warning without affecting existing functionality.
```diff
@@ -6,7 +6,7 @@

 import json
 import os
-from datetime import datetime, timezone
+from datetime import datetime
 from unittest.mock import AsyncMock, MagicMock, patch

 import pytest
```
```python
result = await service.generate_content_async(
    contents,
    response_schema=request.schema
)
```
Bug: The code calls `service.generate_content_async()`, but this method does not exist on the `GeminiService` object, which will cause a runtime `AttributeError`.

Severity: CRITICAL

Suggested Fix: Replace the call to the non-existent `generate_content_async()` method with a suitable existing async method from `GeminiService`, such as `process_youtube()`, `process_text()`, or `process_video()`, which supports structured output via the `response_schema` parameter.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/youtube_extension/backend/api/advanced_video_routes.py#L402-L405
Potential issue: The `/analyze/structured` endpoint at `advanced_video_routes.py`
attempts to call `service.generate_content_async()`. However, the `GeminiService` class
does not have a method with this name. When this endpoint is invoked, the application
will raise an `AttributeError: 'GeminiService' object has no attribute
'generate_content_async'`, causing the request to fail. The developer likely intended to
use an existing method such as `process_youtube()` which supports the `response_schema`
parameter.
```python
    data={
        "timestamp": evt.timestamp,
        "description": evt.description,
        "confidence": evt.confidence,
        "metadata": evt.metadata
    },
    subject=request.video_url
)
if event_id:
    published_ids.append(event_id)
await publisher.close()
```
Bug: The `CloudEventsPublisher` is not closed within a `finally` block, causing a resource leak if an exception occurs during event publishing.

Severity: HIGH

Suggested Fix: Refactor the code to use a `try...finally` block. Initialize the `CloudEventsPublisher` to `None` before the `try` block, create it inside the `try`, and call `await publisher.close()` within the `finally` block if the publisher object was successfully created. This ensures resource cleanup occurs even if an exception is raised.
Prompt for AI Agent
Location: src/youtube_extension/backend/api/advanced_video_routes.py#L188-L203
Potential issue: In the `/temporal/events`, `/analyze/structured`, and `/publish-event`
endpoints, a `CloudEventsPublisher` is created within a `try` block. If an exception
occurs after the publisher is created but before `publisher.close()` is called (e.g.,
during a `publisher.publish()` network request), the `except` block is entered and the
`close()` method is never awaited. This leaks the underlying `httpx.AsyncClient`, which
can lead to resource exhaustion (file descriptors, memory) over time.
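A runnable sketch of the suggested `try...finally` pattern, using a stand-in publisher class (the real `CloudEventsPublisher` wraps an `httpx.AsyncClient`):

```python
import asyncio

class StubPublisher:
    """Stand-in for CloudEventsPublisher that records whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    async def publish(self, **kwargs) -> str:
        raise RuntimeError("simulated network failure")

    async def close(self) -> None:
        self.closed = True

async def publish_with_cleanup() -> StubPublisher:
    publisher = None
    try:
        publisher = StubPublisher()
        await publisher.publish(type="com.example.video.event", data={})
    except Exception:
        pass  # the real endpoint would return an HTTP error response here
    finally:
        # Runs even when publish() raises, so the underlying client is not leaked
        if publisher is not None:
            await publisher.close()
    return publisher

publisher = asyncio.run(publish_with_cleanup())
```

An async context manager (`async with CloudEventsPublisher(...) as publisher:`) would achieve the same guarantee with less boilerplate, if the class grows `__aenter__`/`__aexit__` support.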
```python
)

# Basic auth
username, password = self.openwhisk_auth.split(":")
```
Bug: The OpenWhisk auth string is parsed using `split(":")`, which fails if the password contains a colon, causing the publishing feature to fail silently.

Severity: MEDIUM

Suggested Fix: Change the parsing logic from `self.openwhisk_auth.split(":")` to `self.openwhisk_auth.split(":", 1)`. This will ensure the string is split only on the first colon, correctly separating the username from a password that may contain subsequent colons.
Prompt for AI Agent
Location: src/integration/cloudevents_publisher.py#L274
Potential issue: The code for parsing the OpenWhisk authentication string in
`cloudevents_publisher.py` uses `self.openwhisk_auth.split(":")`. This will raise a
`ValueError` if the password contains a colon (a valid scenario for HTTP Basic Auth) or
if the auth string is misconfigured and contains no colon. While the exception is caught
and logged, it causes the OpenWhisk publishing feature to silently fail for users with
valid credentials that include a colon in the password.
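The failure mode and the fix are easy to demonstrate (the credential string is illustrative):

```python
# HTTP Basic Auth allows colons in the password, so a plain split fails
auth = "whisk-user:secret:with:colons"

try:
    username, password = auth.split(":")  # 4 parts -> ValueError on unpacking
    unpack_failed = False
except ValueError:
    unpack_failed = True

# maxsplit=1 splits only on the first colon, preserving the password intact
username, password = auth.split(":", 1)
```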
Pull request overview
Adds "advanced video analysis" capabilities on top of the existing YouTube→Gemini pipeline: timestamp-focused (temporal) extraction, schema-constrained (structured) outputs, and CloudEvents publishing to external execution backends.

Changes:

- Introduces `TemporalVideoAnalyzer` for segment/timeline/event extraction and temporal Q&A.
- Adds `CloudEventsPublisher` with Pub/Sub, HTTP webhook, OpenWhisk, and file backends.
- Adds a new FastAPI router with endpoints for temporal analysis, structured analysis, and event publishing; plus accompanying tests/docs/examples.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 23 comments.
Summary per file:

| File | Description |
|---|---|
| `src/integration/temporal_video_analysis.py` | New temporal prompt strategies + helpers (segments/events/timeline/etc.). |
| `src/integration/cloudevents_publisher.py` | New CloudEvents v1.0 model + multi-backend publisher. |
| `src/youtube_extension/backend/api/advanced_video_routes.py` | New API surface for temporal/structured analysis and CloudEvents publishing. |
| `tests/unit/test_temporal_video_analysis.py` | Unit tests for the temporal analyzer. |
| `tests/unit/test_cloudevents_publisher.py` | Unit tests for CloudEvents serialization + each backend. |
| `docs/ADVANCED_VIDEO_FEATURES.md` | Long-form docs for the new feature set. |
| `docs/API_QUICK_REFERENCE.md` | Copy/paste API examples for the new endpoints. |
| `docs/IMPLEMENTATION_SUMMARY.md` | Implementation status report and activation steps. |
| `examples/complete_workflow_example.py` | End-to-end example script combining temporal extraction + publishing. |
| `README.md` | API Reference updated to list new endpoints/docs. |
| `IMPLEMENTATION_COMPLETE.md` | "Implementation complete" marker doc + next steps. |
```python
video_url = "https://youtube.com/watch?v=dQw4w9WgXcQ"
```
This example uses the banned test video ID `dQw4w9WgXcQ`. Replace it with the repo's standard test video ID (`auJzb1D-fag`) to avoid violating test-data conventions and automated checks.
```python
# YouTube URL (direct)
result = await service.analyze_video(
    video_url="https://youtube.com/watch?v=dQw4w9WgXcQ",
    prompt="Analyze this video",
    media_resolution="high",  # Use 'high' for text-heavy content
    thinking_level="high"  # Use 'high' for complex reasoning
)
```
This doc uses the banned test video ID `dQw4w9WgXcQ` in examples. Replace it with `auJzb1D-fag` (the repo's required default test video) throughout this document to stay consistent with test-data policy.
```python
@router.post("/publish-event")
async def publish_video_event(
    source: str,
    event_type: str,
    data: Dict,
    subject: Optional[str] = None,
    backend: Optional[str] = None
):
```
`publish_video_event` is documented as accepting a JSON body, but the handler signature uses plain function parameters (`source`, `event_type`, `data`, etc.). In FastAPI this means these values are treated as query parameters, so a JSON POST body like the docs show will not validate. Define a Pydantic request model (or use explicit `Body(...)`) so the endpoint matches its documented request shape.
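A sketch of such a request model (field names mirror the handler's parameters; the model name is illustrative, not the PR's code). In FastAPI, a single Pydantic-model parameter is read from the JSON request body rather than from query parameters:

```python
from typing import Any, Dict, Optional

from pydantic import BaseModel

class PublishEventRequest(BaseModel):
    """JSON body for POST /publish-event."""
    source: str
    event_type: str
    data: Dict[str, Any]
    subject: Optional[str] = None
    backend: Optional[str] = None

# The handler would then be declared as:
#   async def publish_video_event(request: PublishEventRequest): ...
req = PublishEventRequest(
    source="/video/analyzer",
    event_type="com.eventrelay.video.analyzed",
    data={"ok": True},
)
```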
| type="com.eventrelay.video.analyzed.structured", | ||
| data=structured_result, | ||
| subject=request.video_url, | ||
| schema=json.dumps(request.schema) |
There was a problem hiding this comment.
The schema= kwarg passed to publisher.publish(...) becomes a CloudEvents extension attribute, not the standard dataschema attribute. If the intent is to populate the CloudEvents dataschema field, pass dataschema=... instead (or rename the kwarg accordingly) so consumers see it in the standard place.
| schema=json.dumps(request.schema) | |
| dataschema=json.dumps(request.schema) |
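For reference, `dataschema` is a standard CloudEvents v1.0 context attribute, so placing the schema URI there keeps it where generic consumers look. A minimal sketch of the envelope as a plain dict (field values are illustrative, not the PR's publisher internals):

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional

def make_cloudevent(event_type: str, source: str, data: Dict[str, Any],
                    dataschema: Optional[str] = None) -> Dict[str, Any]:
    # Minimal CloudEvents v1.0 envelope
    event = {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "type": event_type,
        "source": source,
        "time": datetime.now(timezone.utc).isoformat(),
        "data": data,
    }
    if dataschema is not None:
        # Standard attribute; a custom "schema" key would be an extension
        event["dataschema"] = dataschema
    return event

evt = make_cloudevent(
    "com.eventrelay.video.analyzed.structured",
    "/video/analyzer",
    {"events": []},
    dataschema="https://example.com/schemas/video-analysis.json",
)
```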
| """ | ||
| Manually publish a video analysis event as a CloudEvent. | ||
|
|
||
| Supports multiple backends: | ||
| - pubsub: Google Cloud Pub/Sub | ||
| - http: HTTP webhook | ||
| - openwhisk: Apache OpenWhisk trigger | ||
| - file: Local file (for testing) | ||
|
|
||
| Example: |
This adds a manual `/publish-event` API that can emit CloudEvents without going through the core "YouTube URL → context → agents → outputs" flow. If that's not intended to be a public workflow, consider scoping it behind an internal flag/auth, moving it under a debug/test router, or removing it to avoid creating an alternate trigger path.
```python
future = self._pubsub_client.publish(
    self._topic_path,
    data,
    **attributes
)
message_id = future.result()
logger.info(f"Published CloudEvent {event.id} to Pub/Sub: {message_id}")
```
`_publish_pubsub` is async but calls `future.result()` directly, which blocks the event loop and can stall the API under load. Offload the blocking wait to a thread/executor (or don't wait synchronously for the publish ack) to keep the async path non-blocking.
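One way to keep the coroutine non-blocking is to hand the blocking `future.result()` call to the default executor. A sketch using a pre-resolved `concurrent.futures.Future` as a stand-in for the future returned by the Pub/Sub client:

```python
import asyncio
from concurrent.futures import Future

async def await_publish_ack(future: Future) -> str:
    # future.result() blocks the calling thread, so run it in a worker
    # thread instead of stalling the event loop
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, future.result)

# Stand-in for the future returned by pubsub_client.publish(...)
fut: Future = Future()
fut.set_result("message-123")
message_id = asyncio.run(await_publish_ack(fut))
```

If the ack is not needed in the response, an alternative is to attach a done-callback and return immediately, trading delivery confirmation for latency.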
```python
import json
import os
from datetime import datetime, timezone
```

`timezone` is imported but not used in this test module. With the repo's Ruff configuration (F401/F841 enabled for tests), this will fail lint. Remove the unused import (or use it explicitly) to keep CI green.

Suggested change:

```diff
-from datetime import datetime, timezone
+from datetime import datetime
```
| """ | ||
|
|
||
| import asyncio | ||
| import json |
There was a problem hiding this comment.
Import of 'json' is not used.
| import json |
```python
import asyncio
import json
import os
```

Import of `os` is not used.

Suggested change:

```diff
-import os
```
```python
    if isinstance(result.summary, str) and result.summary.strip().startswith("{"):
        return json.loads(result.summary)
except json.JSONDecodeError:
    pass
```

The `except` clause does nothing but `pass` and there is no explanatory comment.

Suggested change:

```diff
-    pass
+    logger.warning("Could not parse comparison JSON from Gemini summary")
```
Implements the direct YouTube URL → Gemini analysis → structured events → EventMesh/OpenWhisk execution path, including timestamped temporal extraction and JSON-structured outputs. Adds APIs and documentation to expose these capabilities end-to-end.
Ingestion & temporal understanding
Structured output
Eventing & execution
Surface area
Example (structured event extraction request):